A fully data-driven method to identify (correlated) changes in diachronic corpora

Author

  • Alexander Koplenig
Abstract

In this paper, a method for measuring synchronic corpus (dis-)similarity put forward by Kilgarriff (2001) is adapted and extended to identify trends and correlated changes in diachronic text data, using the Corpus of Historical American English (Davies 2010a) and the Google Ngram Corpora (Michel et al. 2010a). This paper shows that this fully data-driven method, which extracts word types that have undergone the most pronounced change in frequency in a given period of time, is computationally very cheap and that it allows interpretations of diachronic trends that are both intuitively plausible and motivated from the perspective of information theory. Furthermore, it demonstrates that the method is able to identify correlated linguistic changes and diachronic shifts that can be linked to historical events. Finally, it can help to improve diachronic POS tagging and complement existing NLP approaches. This indicates that the approach can facilitate an improved understanding of diachronic processes in
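As a rough illustration of what extracting the word types with the most pronounced frequency change could look like, the sketch below ranks words by their per-word chi-square contribution between two periods. It follows a Kilgarriff-style frequency-list comparison. The abstract does not state the exact statistic used in the paper, so the scoring function, the names `chi2_contributions` and `top_changed`, and the toy counts are all assumptions made for illustration.

```python
from collections import Counter


def chi2_contributions(counts_a, counts_b):
    """Per-word chi-square contribution for two word-frequency lists.

    Assumption: a chi-square comparison of frequency lists, used here as a
    stand-in for the paper's measure, which the abstract does not spell out.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both periods need at least one token")
    grand = total_a + total_b
    scores = {}
    for word in set(counts_a) | set(counts_b):
        obs_a = counts_a.get(word, 0)
        obs_b = counts_b.get(word, 0)
        # Expected counts if the word were equally likely in both periods.
        exp_a = (obs_a + obs_b) * total_a / grand
        exp_b = (obs_a + obs_b) * total_b / grand
        scores[word] = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return scores


def top_changed(counts_a, counts_b, n=10):
    """Return the n word types with the largest chi-square contribution."""
    scores = chi2_contributions(counts_a, counts_b)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical toy counts for two periods (not taken from COHA or Google Ngrams).
period_1 = Counter({"the": 100, "steam": 20, "radio": 1})
period_2 = Counter({"the": 100, "steam": 2, "radio": 25})
print(top_changed(period_1, period_2, n=2))
```

Each word needs only a handful of arithmetic operations on its two counts, which is consistent with the abstract's point that such a frequency-based approach is computationally cheap.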

Similar resources

Automatically Identifying Instances of Change in Diachronic Corpus Data

With the increasing availability of diachronic corpora, machine-aided identification of linguistic items that have undergone significant change is set to become an important task. This importance is heightened further if, as Hilpert and Gries (2009:386) have argued, approaching linguistic change in a data-driven manner can reveal otherwise unnoticed phenomena. Key to this endeavour is being abl...

Finding Developmental Groups in Acquisition Data: Variability-based Neighbour Clustering

This article introduces a quantitative, data-driven method to identify clusters of groups of data points in longitudinal data. We illustrate this method with examples from first-language acquisition research. First, we discuss a variety of shortcomings of current practices in the identification and handling of stages in studies of language acquisition. Second, we explain and exemplify our method...

A Comparative Study of Metaphorical Markers in Academic Research Articles

Although the use of metaphorical markers in corpora has been studied to a large extent (e.g., Glucksberg & Keysar 1993; Skorczynska & Deignan, 2006; Sznjder, 2005), no attempt to the best of the researchers' knowledge has been made to describe metaphorical marking in a comparative analysis of 2 corpora in both national and international journals of applied linguistics in Iran. The gap envisaged has ...

Assessing frequency changes in multistage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition

The use of corpora that are divided into temporally ordered stages is becoming increasingly wide-spread in historical corpus linguistics. This development is partly due to the fact that more and more resources of this kind are being developed. Since the assessment of frequency changes over multiple periods of time is a relatively recent practice, there are few agreed-upon standards of how such ...

The Effect of Morphology on Dependency Parsing of Persian

Data-driven systems can be adapted easily to different languages and domains. This trend has led to the introduction of data-driven approaches in dependency parsing. The only prerequisite for such approaches is the existence of appropriate corpora that contain sentences and their associated dependency trees. Despite obtaining highly accurate results for dependency parsing task in English langu...

Journal title:
  • CoRR

Volume abs/1508.06374

Publication date 2015